dCITE: Measuring Necessary Cladistic Information Can Help You Reduce Polytomy Artefacts in Trees
Author
Abstract
Biologists regularly create phylogenetic trees to better understand the evolutionary origins of their species of interest, and often use genomes as their data source. However, as more and more incomplete genomes are published, it is often not possible to compute genome-based phylogenetic trees because of large gaps in the assembled sequences. In addition, comparison of complete genomes may not even be desirable due to the presence of horizontally acquired and homologous genes. A decision must therefore be made about which gene, or combination of genes, should be used to compute a tree. Deflated Cladistic Information based on Total Entropy (dCITE) is proposed as an easily computed metric for measuring the cladistic information in multiple sequence alignments representing a range of taxa, without the need to first compute the corresponding trees. dCITE scores can be used to rank candidate genes, or to decide whether the input sequences provide too little cladistic information, which makes artefactual polytomies more likely. The dCITE method can be applied to protein, nucleotide or encoded phenotypic data, so where a choice exists it can also be used to select the most appropriate data type. In a series of experiments the dCITE method was compared with related measures. Then, as a practical demonstration, the ideas developed in the paper were applied to a dataset representing species from the order Campylobacterales; trees based on sequence combinations, selected on the basis of their dCITE scores, were compared with a tree constructed to mimic Multi-Locus Sequence Typing (MLST) combinations of fragments. Three findings emerge. First, the greater the dCITE score, the more likely it is that the computed phylogenetic tree will be free of artefactual polytomies. Second, cladistic information saturates: beyond the saturation point, adding further sequences yields little additional cladistic information. Finally, sequences with high cladistic information produce more consistent trees for the same taxa.
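The abstract does not give the dCITE formula, so the following is only a rough illustration of the general idea of scoring alignments by entropy before building any tree. It sums the Shannon entropy of each alignment column and ranks candidate genes by that total. The function names, the gap characters, and the absence of any "deflation" step are all assumptions made for this sketch; it is not the dCITE method itself.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-?"):
    """Shannon entropy (in bits) of one alignment column, ignoring gap symbols."""
    symbols = [c for c in column if c not in gap_chars]
    if not symbols:
        return 0.0
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def total_entropy(alignment):
    """Sum of column entropies over a list of equal-length aligned sequences.

    Assumption: this plain sum stands in for an entropy-based score;
    it is NOT the published dCITE formula.
    """
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    return sum(column_entropy(col) for col in zip(*alignment))


def rank_by_total_entropy(alignments):
    """Return gene names ordered from highest to lowest total entropy.

    `alignments` is a hypothetical mapping: gene name -> list of aligned sequences.
    """
    return sorted(alignments, key=lambda g: total_entropy(alignments[g]), reverse=True)
```

A fully conserved alignment scores zero under this proxy, while an alignment whose columns vary across taxa scores higher. That matches the abstract's intuition that uninformative sequences are the ones more likely to leave polytomies unresolved.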
Similar resources
RadCon: phylogenetic tree comparison and consensus
SUMMARY RadCon is a Macintosh program for manipulating and analysing phylogenetic trees. The program can determine the Cladistic Information Content of individual trees, the stability of leaves across a set of bootstrap trees, produce the strict basic Reduced Cladistic Consensus profile of a set of trees and convert a set of trees into its matrix representation for supertree construction. AVA...
No substitute for real data: phylogenies from birth-death polytomy resolvers should not be used for many downstream comparative analyses
The statistical estimation of phylogenies is always associated with uncertainty, and accommodating this uncertainty is an important component of modern phylogenetic comparative analysis. The birth-death polytomy resolver is a method of accounting for phylogenetic uncertainty that places missing (unsampled) taxa onto phylogenetic trees, using taxonomic information alone. Recent studies of birds ...
Prediction and Diagnosis of Diabetes Mellitus using a Water Wave Optimization Algorithm
Data mining is an appropriate way to discover information and hidden patterns in large amounts of data, where the hidden patterns cannot be easily discovered in normal ways. One of the most interesting applications of data mining is the discovery of diseases and disease patterns through investigating patients' records. Early diagnosis of diabetes can reduce the effects of this devastating disea...
Polytomy refinement for the correction of dubious duplications in gene trees
MOTIVATION Large-scale methods for inferring gene trees are error-prone. Correcting gene trees for weakly supported features often results in non-binary trees, i.e. trees with polytomies, thus raising the natural question of refining such polytomies into binary trees. A feature pointing toward potential errors in gene trees are duplications that are not supported by the presence of multiple gen...
Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies
Phylogenetic species trees typically represent the speciation history as a bifurcating tree. Speciation events that simultaneously create more than two descendants, thereby creating polytomies in the phylogeny, are possible. Moreover, the inability to resolve relationships is often shown as a (soft) polytomy. Both types of polytomies have been traditionally studied in the context of gene tree r...
Journal title:
Volume 11, Issue
Pages -
Publication date: 2016